Abstract

The present work proposes a new method for color image segmentation. The approach computes interest points on the binary image in order to extract the binary mask used to separate the object from its background. The image is binarized with the local Otsu thresholding method, which reduces the effect of heterogeneity and thus improves object segmentation. To evaluate the effectiveness of the proposed method, we compared its object segmentation results with those of the GrabCut method. Experimental results show the robustness of our approach for detecting objects in real images.


Keywords: Color image segmentation, Lab color space, 
heterogeneity, local Otsu thresholding, interest points, 
morphological processing. 

1. Introduction 

This paper addresses the problem of object segmentation, which is a challenging task, especially when the image is heterogeneous. Segmentation techniques can be classified into different categories [1]: threshold-based, region-based, cluster-based and edge-based.

Image segmentation based on thresholding is one of the oldest techniques. It classifies the pixels into two classes: pixels whose intensity is below a threshold belong to one class, while the other pixels belong to the other class [2, 3]. Region-based methods divide an image into regions in which the pixels share similar properties according to predefined conditions [4]. Clustering is an unsupervised learning technique in which the number of clusters is fixed in advance, and pixels with similar properties are grouped into the same cluster [5]. Edge-based segmentation methods attempt to segment the image by detecting the edges between different regions [6]. These edges are detected at locations where a significant intensity change occurs; the pixels at these locations are extracted and grouped to highlight the edges in the image.

Object segmentation can also be accomplished with the active contour method, which has become very popular and widely used in image segmentation. This method has two variants, edge-based and region-based. In the edge-based active contour approach [7, 8], image gradients are used to evolve a contour until it reaches the object boundaries; however, this approach is sensitive to noise and to the initialization of the active contour. Alternatively, the region-based active contour approach [9, 10, 11, 12, 13, 14] uses information extracted from the regions instead of the image gradients. These approaches model the foreground and background regions statistically and deform the active contour to identify the real object boundaries. For both edge-based and region-based active contours, an energy function is minimized: the contour deforms until the optimum of this energy is reached, at which point the object of interest is segmented. However, these approaches still suffer from limitations, mainly due to parameter estimation and to the position of the initial active contour in the image.

Heterogeneity is almost omnipresent in real images, in both the object and the background, which hinders object segmentation and makes it more difficult. To reduce the difficulties due to heterogeneity, we propose in this paper a new segmentation method relying on interest points extracted from a binary image, obtained by local thresholding, rather than from the real image. Interest points are particularly relevant because they are simple and robust low-level features that provide an efficient characterization of the object. They can be defined as points in the image where significant changes occur, such as corners, junctions, black points on a white background, or other points marked by an important change of texture. In our work, we use the Harris detector [15] because of its popularity and its satisfactory results in extracting interest points.

The paper is organized as follows. Thresholding is defined in Section 2. The Harris detector is described in Section 3. Morphological operations are defined in Section 4. We present the proposed object segmentation approach in Section 5 and experimental results in Section 6. We conclude the paper in Section 7.


2. Thresholding 

The purpose of thresholding methods is to reduce the unnecessary information in the image, leaving only the information useful for further processing. Thresholding classifies the image into two regions, corresponding to the object and its background, using an optimal threshold value. We divide thresholding methods into two groups: global and local methods. Global methods, such as those proposed by Otsu [16] and Kapur et al. [17], use a single threshold value to segment an image into two classes, whereas local methods estimate a threshold value for each pixel from information extracted from the neighboring pixels, such as range, variance or surface-fitting parameters. The techniques of Bernsen [18], Chow and Kaneko [19], Eikvil [20], Mardia and Hainsworth [21], Niblack [22], Taxt [23], Yanowitz and Bruckstein [24] and Sauvola and Pietikainen [25] belong to this category.


a. Global thresholding

Otsu's thresholding is a clustering method that selects the optimal threshold by maximizing the between-class variance, which is equivalent to minimizing the within-class variance. The within-class variance can be formulated as:

$$\sigma^2_{within}(T) = n_B(T)\,\sigma^2_B(T) + n_O(T)\,\sigma^2_O(T) \tag{1}$$

where T is the threshold value, n_B(T) and n_O(T) denote respectively the proportions (class probabilities) of pixels in the background and the object, and σ²_B(T) and σ²_O(T) are respectively the variances of the pixel intensities in the background and the object.

The between-class variance is given by:

$$\sigma^2_{between}(T) = n_B(T)\,n_O(T)\,\left[\mu_B(T) - \mu_O(T)\right]^2 \tag{2}$$

where μ_B and μ_O represent respectively the mean pixel intensities of the background and object regions. The optimal threshold can be defined as:

$$T_{opt} = \arg\max_T \sigma^2_{between}(T) = \arg\min_T \sigma^2_{within}(T) \tag{3}$$

Otsu's method gives satisfactory results when each class region is homogeneous in terms of pixel intensities, and it remains one of the most popular thresholding methods.

Kapur's method performs bi-level thresholding based on entropy: the foreground and background of the image are considered as two different signal sources, and the image is considered optimally thresholded when the sum of the two class entropies reaches its maximum. Let I be an image containing n pixels whose gray levels belong to the set {0, 1, ..., L-1}, with probability distribution p_1, p_2, ..., p_L. From this distribution, we derive two probability distributions, one for the object (f), p_i/P_f for i ≤ t, and one for the background (b), p_i/P_b for i > t, where:

$$P_f = \sum_{i=1}^{t} p_i, \qquad P_b = \sum_{i=t+1}^{L} p_i \tag{4}$$

and t is the threshold. The foreground and background entropies are defined respectively by:

$$H_f = -\sum_{i=1}^{t} \frac{p_i}{P_f}\,\ln\frac{p_i}{P_f} \tag{5}$$

$$H_b = -\sum_{i=t+1}^{L} \frac{p_i}{P_b}\,\ln\frac{p_i}{P_b} \tag{6}$$

The optimal threshold is the value maximizing the aggregated entropy:

$$T_{opt} = \arg\max_t \left[H_f(t) + H_b(t)\right] \tag{7}$$

b. Local adaptive thresholding

In the method of Sauvola [25], which improves on Niblack's method [22], the threshold T(x, y) is computed from the mean m(x, y) and the standard deviation σ(x, y) of the pixel intensities within a w × w window as:


$$T(x,y) = m(x,y)\left[1 + k\left(\frac{\sigma(x,y)}{R} - 1\right)\right] \tag{8}$$


R is the maximum value of the standard deviation and k 
is a bias term which takes positive values in the range 
[0.2, 0.5]. 
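A minimal NumPy sketch of Eq. (8) follows. It is illustrative only, not the authors' MATLAB code, and it assumes an odd window size w, reflect padding at the image borders and R = 128 (a common choice for 8-bit images); the defaults w = 15 and k = 0.2 are examples, not values taken from this paper. The local mean and standard deviation are obtained from integral images, so each window sum costs O(1).

```python
import numpy as np


def sauvola_threshold(img, w=15, k=0.2, R=128.0):
    """Per-pixel Sauvola threshold of Eq. (8): T = m * (1 + k * (s / R - 1)).

    img: 2-D gray-level array; w: odd window size (illustrative default);
    k: bias term; R: dynamic range of the standard deviation (assumed 128).
    """
    img = np.asarray(img, dtype=float)
    if w % 2 == 0:
        raise ValueError("window size w must be odd")
    r = w // 2
    # Reflect-pad so that every pixel has a full w x w neighbourhood.
    p = np.pad(img, r, mode="reflect")
    # Integral images of I and I^2, with a leading row/column of zeros.
    s1 = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s2 = np.pad((p * p).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, wd = img.shape

    def window_sum(s):
        # Sum over the w x w window centred on each original pixel.
        return s[w:w + h, w:w + wd] - s[:h, w:w + wd] - s[w:w + h, :wd] + s[:h, :wd]

    n = float(w * w)
    m = window_sum(s1) / n                              # local mean m(x, y)
    var = np.maximum(window_sum(s2) / n - m * m, 0.0)   # clip rounding noise
    return m * (1.0 + k * (np.sqrt(var) / R - 1.0))
```

A pixel would then be labelled foreground when `img[y, x] > T[y, x]`. On a constant image the local standard deviation is zero, so the threshold reduces to (1 - k) times the gray level.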

In general, each global method can be applied in a local 
version by dividing the image into blocks and then using 
a global threshold on each block independently. Figure 1 
shows an example of image thresholding using various 
methods of binarization previously defined. 
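The two global criteria defined above, Otsu's Eq. (3) and Kapur's Eq. (7), can be sketched as exhaustive searches over a gray-level histogram. This is an illustrative NumPy version, not the authors' implementation: gray levels are indexed from 0, a candidate threshold t places levels 0..t in the first class and t+1..L-1 in the second, and empty histogram bins are skipped in the entropy sums (0 · ln 0 is taken as 0).

```python
import numpy as np


def otsu_threshold(hist):
    """Return the t maximizing the between-class variance (Eqs. (2)-(3))."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()
    levels = np.arange(len(p))
    best_t, best_var = 0, -1.0
    for t in range(len(p) - 1):
        w0, w1 = p[:t + 1].sum(), p[t + 1:].sum()   # proportions of pixels per class
        if w0 == 0 or w1 == 0:
            continue                                 # one class would be empty
        mu0 = (levels[:t + 1] * p[:t + 1]).sum() / w0
        mu1 = (levels[t + 1:] * p[t + 1:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def kapur_threshold(hist):
    """Return the t maximizing the sum of the two class entropies (Eqs. (4)-(7))."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()
    best_t, best_h = 0, -np.inf
    for t in range(len(p) - 1):
        pf, pb = p[:t + 1].sum(), p[t + 1:].sum()    # class probabilities, Eq. (4)
        if pf == 0 or pb == 0:
            continue
        qf = p[:t + 1][p[:t + 1] > 0] / pf           # object distribution
        qb = p[t + 1:][p[t + 1:] > 0] / pb           # background distribution
        h = -(qf * np.log(qf)).sum() - (qb * np.log(qb)).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t
```

With this convention, a pixel of gray level g would be assigned to the first class when g <= t. Both functions keep the first threshold reaching the maximum when several thresholds tie.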



Figure 1. Thresholding methods: (a) input image; (b) global Otsu thresholding; (c) global entropy thresholding; (d) local Otsu thresholding (180 × 180 pixels); (e) local entropy thresholding (180 × 180 pixels); (f) Sauvola thresholding (180 × 180 pixels).

From the results presented in Figure 1, it can be inferred that the local Otsu method provides better thresholding than the other local and global methods.
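The local Otsu binarization retained here, i.e. a global Otsu threshold applied independently to each block as described in this section, might be sketched as follows. The 180 × 180 block size comes from Figure 1; the 8-bit input, the non-overlapping tiling and the convention that pixels above the threshold are foreground are assumptions. The between-class variance of Eq. (2) is evaluated for all thresholds at once from cumulative sums, in the algebraically equivalent form (μ_T ω_0 - μ(t))² / (ω_0 ω_1), where μ_T is the block mean.

```python
import numpy as np


def otsu_level(block, bins=256):
    """Global Otsu threshold of one block of 8-bit gray levels (Eq. (3))."""
    hist = np.bincount(block.ravel(), minlength=bins).astype(float)
    p = hist / hist.sum()
    levels = np.arange(bins)
    w0 = np.cumsum(p)             # proportion of pixels with level <= t
    w1 = 1.0 - w0                 # proportion of pixels with level > t
    mu = np.cumsum(levels * p)    # first moment of the levels <= t
    mu_t = mu[-1]                 # mean gray level of the whole block
    var_b = np.zeros(bins)
    valid = (w0 > 1e-12) & (w1 > 1e-12)   # both classes must be non-empty
    var_b[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(var_b))


def local_otsu(img, block=180):
    """Binarize img by applying Otsu independently to each block x block tile."""
    img = np.asarray(img, dtype=np.uint8)
    out = np.zeros(img.shape, dtype=bool)
    for y in range(0, img.shape[0], block):
        for x in range(0, img.shape[1], block):
            tile = img[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = tile > otsu_level(tile)
    return out
```

On a synthetic image made of two differently lit halves, each containing a small square, a single global Otsu threshold on the whole image falls between the two illumination levels and marks the whole brighter half as foreground, whereas the block-wise version isolates both squares. A perfectly uniform tile has no meaningful threshold; in this sketch it is labelled foreground whenever its gray level is above 0, which a practical implementation would need to handle.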

3. Harris Detector 

The Harris detector, used here to extract the interest points, is based on the Moravec detector [26]. It extracts corners as interest points using a differential method, based on the autocorrelation of the image intensity values or of the image gradient values. The gradient covariance matrix is given by:

$$C = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \tag{9}$$

where I_x = ∂I/∂x and I_y = ∂I/∂y denote the image gradients in the x and y directions, and W is a local window around the pixel. The Harris corner detector considers the minimum and maximum eigenvalues, α and β respectively, of the gradient covariance matrix C. A corner occurs when the two eigenvalues are large and similar in magnitude. Harris [27] proposes a measure based on the determinant and the trace of the gradient covariance matrix, defined as:

$$H = \alpha\beta - k(\alpha+\beta)^2 = \det C - k\,(\operatorname{trace} C)^2 \tag{10}$$

where k belongs to the interval [0.04, 0.06].

The pixels are classified according to the value of H as follows: H > 0: corner pixel; H ≈ 0: pixel in a flat region; H < 0: edge pixel.
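A minimal NumPy sketch of the Harris measure of Eq. (10), computed as det C - k (trace C)² without an explicit eigen-decomposition. Assumptions not taken from the paper: gradients from `np.gradient` (central differences), an unweighted 3 × 3 box window W (Harris' original formulation uses a Gaussian weighting), zero padding at the border and k = 0.04. Interest points could then be taken among the pixels with large positive H; that selection rule is also an assumption.

```python
import numpy as np


def harris_response(img, k=0.04, win=3):
    """Harris measure H = det(C) - k * trace(C)**2 at every pixel (Eq. (10))."""
    img = np.asarray(img, dtype=float)
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x).
    iy, ix = np.gradient(img)
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def window_sum(a):
        # Sum over a win x win box window W centred on each pixel (zero padding).
        r = win // 2
        p = np.pad(a, r)
        out = np.zeros_like(a)
        for dy in range(win):
            for dx in range(win):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    # Entries of the gradient covariance matrix C, summed over W.
    sxx, syy, sxy = window_sum(ixx), window_sum(iyy), window_sum(ixy)
    det_c = sxx * syy - sxy ** 2
    trace_c = sxx + syy
    return det_c - k * trace_c ** 2
```

On a binary square, this response is positive at a corner, negative along a straight edge and zero inside the flat region, matching the classification above.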

4. Morphological operations 

Morphological operations [28] are image processing techniques based on shape. The value of each pixel in the output image is based on a comparison of the corresponding pixel in the input image with its neighbors. The fundamental morphological operations are erosion and dilation. Erosion is typically applied to binary images, but there are versions that work on grayscale images. The basic effect of the operator on a binary image is to erode away the boundaries of the foreground regions: the value of the output pixel is the minimum value of all the neighboring pixels in the input image. The erosion operation is defined by:

$$f(x,y) = \min\{I(x,y) \text{ and its neighboring pixels}\} \tag{11}$$

f(x,y) is the eroded image at pixel (x,y), and I(x,y) is 
the pixel intensity. 

Dilation adds pixels to the boundary of the regions of 
foreground pixels. Thus, areas of foreground regions 
grow in size while holes within those regions become 
smaller. The value of the output pixel is the maximum of 
all the neighboring pixels in the input image. The dilation 
is defined by: 


$$g(x,y) = \max\{I(x,y) \text{ and its neighboring pixels}\} \tag{12}$$

g(x,y) is the dilated image at pixel (x,y). 
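Eqs. (11) and (12) can be sketched with NumPy as a minimum or maximum over shifted copies of the image. The square size × size structuring element and the edge-replicated border are assumptions, and the function names are illustrative; the same functions work on binary (boolean) and grayscale arrays.

```python
import numpy as np


def _neighbourhood_stack(img, size):
    """Stack the size x size neighbourhood of every pixel (edge-replicated border)."""
    r = size // 2
    p = np.pad(img, r, mode="edge")
    h, w = img.shape
    return np.stack([p[dy:dy + h, dx:dx + w]
                     for dy in range(size) for dx in range(size)])


def erode(img, size=3):
    """Eq. (11): minimum over the pixel and its neighbours."""
    return _neighbourhood_stack(np.asarray(img), size).min(axis=0)


def dilate(img, size=3):
    """Eq. (12): maximum over the pixel and its neighbours."""
    return _neighbourhood_stack(np.asarray(img), size).max(axis=0)
```

One common way to extract a one-pixel boundary from a binary mask, given here as an assumption rather than as the authors' choice, is `mask & ~erode(mask)`.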

5. Proposed Approach 

In this section, we describe the proposed object segmentation approach. We use the interest points extracted from the binary image after a thresholding step, as shown in the flowchart of Figure 2. The region with the maximum number of interest points is selected as the zone containing the object of interest. The binary mask computed from this region separates the object from its background, which achieves the object segmentation. The contour of the object is then detected on the input image.



Figure 2. Flowchart of the proposed algorithm. 

To test the efficiency of this algorithm on real images, let us consider the image of the boat presented in Figure 3(a). The image is binarized (Figure 3(b)) with the local Otsu thresholding technique (180 × 180 pixels), which produces better results than the global thresholding methods and the other local methods, as shown in Figure 1. In the local Otsu method, the threshold is computed individually for each pixel by applying global thresholding to the local neighborhood of the pixel. This operation classifies the image into regions (clusters), separating the foreground from the background. The interest points are then extracted from the binary image (Figure 3(c)). These points are local maxima that differ from their neighbors in intensity, color and curvature direction, and they are used to characterize the areas with the most visual information. The object of interest is assumed to be the most dominant region in terms of information, and therefore to lie in the cluster containing the maximum number of interest points. In our work, a convex hull (red curve) is drawn to delimit this interest region, as shown in Figure 3(d).
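The selection of the interest region can be sketched as follows, under stated assumptions: regions are taken to be the 4-connected components of the binary image, and each interest point, given as (row, col) coordinates (for example from the Harris step), is counted in the component it falls in, while points lying on background pixels are ignored. The function names are hypothetical, and the convex-hull drawing step is not shown.

```python
from collections import deque

import numpy as np


def label_components(binary):
    """4-connected component labelling by breadth-first search (0 = background)."""
    binary = np.asarray(binary, dtype=bool)
    labels = np.zeros(binary.shape, dtype=int)
    h, w = binary.shape
    current = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and labels[sy, sx] == 0:
                current += 1                      # start a new component
                labels[sy, sx] = current
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current


def region_with_most_points(binary, points):
    """Mask of the component containing the most interest points (row, col)."""
    labels, n = label_components(binary)
    counts = np.zeros(n + 1, dtype=int)
    for y, x in points:
        counts[labels[y, x]] += 1
    counts[0] = 0                                 # ignore points on the background
    if counts.max() == 0:
        return np.zeros(labels.shape, dtype=bool) # no point falls in any region
    return labels == int(np.argmax(counts))
```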
Morphological operations with a filter are then applied to the binary image: dilation enlarges the boundaries of the foreground region, while filtering fills the holes in the interest region (Figure 3(e)). The binary mask is then obtained, as shown in Figure 3(f). This mask is multiplied by the input real image to separate the object from its background (Figure 3(g)). Finally, the boundary of the object is detected and drawn on the image, as shown in Figure 3(h). The energy of our segmentation method can be expressed as:

$$E = -\left(\arg\max_T \sigma^2_{between}(T) + \max(H) + g(x,y)\right) \tag{13}$$



Figure 3. Image segmentation steps. 
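Steps (e) to (g) of Figure 3 might be sketched as follows. Hole filling is implemented here as a flood fill of the background from the image border, so that any background pixel not reached is treated as a hole; this is a common technique assumed for illustration, since the paper does not specify its filter. The hypothetical `apply_mask` multiplies an H × W × 3 image by the binary mask, as described for Figure 3(g).

```python
from collections import deque

import numpy as np


def fill_holes(mask):
    """Fill background regions not connected to the image border (4-connectivity)."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    outside = np.zeros_like(mask)
    queue = deque()
    # Seed the flood fill with every background pixel on the image border.
    border = [(y, x) for y in range(h) for x in (0, w - 1)]
    border += [(y, x) for x in range(w) for y in (0, h - 1)]
    for y, x in border:
        if not mask[y, x] and not outside[y, x]:
            outside[y, x] = True
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    # Everything that is not reachable background belongs to the filled object.
    return ~outside


def apply_mask(rgb, mask):
    """Multiply an H x W x 3 image by a binary H x W mask."""
    rgb = np.asarray(rgb)
    return rgb * np.asarray(mask)[..., None].astype(rgb.dtype)
```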


6. Experiments 

Figure 4. (a) Input image, (b) Binary mask, (c) Segmented interest object, (d) Object of interest boundary (blue curve). 


To evaluate the performance of the proposed approach, we used various images from the Berkeley image database [29]. The input is an RGB image, which is converted to the CIE L*a*b* (Lab) color space. We adopt the Lab color feature because this color space is one of the most widely adopted color models for describing the colors visible to humans.
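The RGB to CIE L*a*b* conversion can be sketched with the standard CIE formulas, assuming sRGB input scaled to [0, 1] and a D65 reference white (these are also the defaults of MATLAB's `rgb2lab`); the paper states only that the RGB image is converted to Lab, so these choices are assumptions.

```python
import numpy as np


def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape (..., 3)) to CIE L*a*b*, D65 white."""
    rgb = np.asarray(rgb, dtype=float)
    # 1. Undo the sRGB gamma to obtain linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # 2. Linear RGB -> CIE XYZ (sRGB primaries, D65).
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ m.T
    # 3. Normalise by the D65 reference white and apply the CIE f() function.
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d * d) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

For an 8-bit image, the values would first be divided by 255.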
Figure 4 shows the results obtained by the proposed approach. Column (a) of this figure shows real input images with heterogeneous characteristics. Column (b) shows the binary mask obtained after the morphological filtering and hole-filling operations. On the one hand, the binarization method preserves most of the information in the image; on the other hand, the interest points provide an efficient characterization of the interest region in the binary image. As a result, accurate object segmentation is obtained, as shown in column (c). In each case, the result shows the accuracy of the object segmentation obtained by our approach, even when heterogeneity appears in the foreground (second and seventh rows of Figure 4), in the background (first and fourth rows of Figure 4), or in both.

These results also reveal some advantages of our proposed approach compared with the active contour method. Our approach finds the real object boundary directly (column (d) of Figure 4), without the evolution of a curve (active contour) required by the active contour method. With that method, the convex hull around the object of interest (Figure 3(d)) could provide an automatic initialization of the active contour, which would then be deformed iteratively until it reaches the object boundary.

We have compared the proposed approach with the GrabCut method [30], which separates an object from the background in a color image under some constraints. The segmentation is initialized by manually drawing a rectangle around the object of interest. All the pixels outside the rectangle are assumed to belong to the background, while those inside are assumed to define the object region (which also contains part of the background). The image pixels are then classified iteratively into two clusters corresponding to the object region and the background region.
Figure 5. Comparison of the segmentation results obtained by GrabCut and the proposed method (columns: input image, GrabCut, proposed method).

All the algorithms were implemented in MATLAB and tested on an Intel(R) Core(TM) i5-2520M 2.50 GHz CPU with 4 GB of memory, running Windows 7.

The results presented in Figure 5 show that our proposed approach performs better than the GrabCut method. In the five images of Figure 5, named "rower", "tiger", "horses", "mollusk" and "echinoderm", the object of interest is accurately separated from the background by our approach. With an initialization around the object of interest, the GrabCut method wrongly assigns some background pixels to the foreground (first, third and last rows of Figure 5).



We also compared the convergence of the energy, in terms of execution time, for the proposed approach and the GrabCut method. The results in Figure 6 concern the echinoderm image (the third row of Figure 5), an example of an image with heterogeneous characteristics in both the foreground and the background. The energy computed with our approach (red curve) converges faster (in 0.4 seconds) than the energy computed with the GrabCut method (blue curve), which requires 0.8 seconds to converge. Our approach therefore outperforms GrabCut both in segmentation accuracy (Figure 5) and in computation time (Figure 6).



Figure 6. Execution time of echinoderm image segmentation using 
GrabCut and proposed method. 

Tables 1 and 2 present a statistical comparison of the segmentation results obtained by the proposed method and by GrabCut, based on the PSNR (Peak Signal-to-Noise Ratio) and MSE (Mean Square Error) similarity measures. These tables show that our method yields a higher PSNR than GrabCut for every image, and a lower error (MSE).
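For reference, the two measures follow the standard definitions MSE = mean((A - B)²) and PSNR = 10 log10(peak² / MSE), in dB. The sketch below assumes that each segmentation result is compared with a reference image of the same size and uses peak = 255 for 8-bit data; the paper does not state which reference image or peak value was used, so this is not a reproduction of the reported values.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two images of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))


def psnr(a, b, peak=255.0):
    """PSNR in dB: 10 * log10(peak**2 / MSE); infinite for identical images."""
    e = mse(a, b)
    return float("inf") if e == 0 else float(10.0 * np.log10(peak ** 2 / e))
```

Two 8-bit images that differ by one gray level at every pixel give an MSE of 1 and a PSNR of 20 log10(255), about 48.13 dB.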

Consequently, our proposed approach proves more efficient than the GrabCut method and provides more accurate object segmentation.


Image        Metric       Proposed method   GrabCut method
rower        PSNR (dB)    26.61             26.15
             MSE          0.59              0.60
tiger        PSNR (dB)    28.40             25.35
             MSE          0.13              0.78
horses       PSNR (dB)    26.47             25.51
             MSE          0.60              0.75
mollusk      PSNR (dB)    25.53             25.43
             MSE          0.76              0.77
echinoderm   PSNR (dB)    25.41             25.12
             MSE          0.78              0.83


Table 1. Performance comparison of GrabCut and the proposed method 
using PSNR and MSE. 


Metric       Proposed method   GrabCut method
PSNR (dB)    26.48             25.51
MSE          0.57              0.75


Table 2. Average performance of GrabCut and the proposed method 
using PSNR and MSE. 


7. Conclusion 

In this paper, a new approach is proposed for color object segmentation. The object is extracted from the background using interest points computed on the binary image, and the object boundary is then detected. The proposed method combines local thresholding, interest points and morphological operations. Experimental results show that it is competitive with well-known methods in the literature, in particular the active contour and GrabCut methods. In future work, we will extend our approach to detect and track multiple objects.